Converting Russian Dependency Treebank to Stanford Typed Dependencies Representation

نویسندگان

  • Janna Lipenkova
  • Milan Soucek
چکیده

In this paper, we describe the process of rulebased conversion of Russian dependency treebank into the Stanford dependency (SD) schema. The motivation behind this project is the expansion of the number of languages that have treebank resources available in one consistent annotation schema. Conversion includes creation of Russian-specific SD guidelines, defining conversion rules from the original treebank schema into the SD model and evaluation of the conversion results. The converted treebank becomes part of a multilingual resource for NLP purposes.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

متن کامل

Converting Italian Treebanks: Towards an Italian Stanford Dependency Treebank

The paper addresses the challenge of converting MIDT, an existing dependency– based Italian treebank resulting from the harmonization and merging of smaller resources, into the Stanford Dependencies annotation formalism, with the final aim of constructing a standard–compliant resource for the Italian language. Achieved results include a methodology for converting treebank annotations belonging ...

متن کامل

Linguistic Issues in Language Technology LiLT

This paper presents an ongoing project whose goal is to create a freely available dependency treebank for Persian. The data is taken from the Bijankhan corpus, which is already annotated for parts of speech, and a syntactic dependency annotation based on the Stanford Typed Dependencies is added through a bootstrapping procedure involving the opensource dependency parser MaltParser. We report pr...

متن کامل

Dependency Parsers for Persian

We present two dependency parsers for Persian, MaltParser and MSTParser, trained on the Uppsala PErsian Dependency Treebank. The treebank consists of 1,000 sentences today. Its annotation scheme is based on Stanford Typed Dependencies (STD) extended for Persian with regard to object marking and light verb contructions. The parsers and the treebank are developed simultanously in a bootstrapping ...

متن کامل

A Persian Treebank with Stanford Typed Dependencies

We present the Uppsala Persian Dependency Treebank (UPDT) with a syntactic annotation scheme based on Stanford Typed Dependencies. The treebank consists of 6,000 sentences and 151,671 tokens with an average sentence length of 25 words. The data is from different genres, including newspaper articles and fiction, as well as technical descriptions and texts about culture and art, taken from the op...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014